Information processing apparatus, information processing method, and storage medium

ABSTRACT

An information processing apparatus inputs a document having a plurality of input items, and displays it using an information display unit. An active input item is discriminated from the plurality of input items in accordance with the display state of the document. A specific grammar corresponding to the discriminated active input item is selected from a grammar holding unit for holding a plurality of types of grammars, and the selected grammar is used in a speech recognition process.

TECHNICAL FIELD

[0001] The present invention relates to an information processing apparatus, information processing method, and storage medium and, more particularly, to an apparatus and method for executing an information process by combining a speech input and a GUI.

BACKGROUND ART

[0002] Along with the advance of speech recognition and synthesis techniques, information input apparatuses that use speech have been put into practical use. Furthermore, information input apparatuses that combine speech with another means are also available. In such an apparatus, the respective means can compensate for each other's demerits and exploit each other's merits.

[0003] As one such apparatus, an interface apparatus that combines a speech input and a GUI is known. By inputting information while exploiting the merits of the speech input and the GUI, their respective demerits are compensated for.

[0004] More specifically, speech is a natural interface means for a human being and has the merit of easy input/output, but has the demerit of lacking browsability. On the other hand, the GUI has browsability as output means, so when input items (input fields) are browsably displayed, it allows easy input such as menu selection. However, a demerit of the GUI is that free input is difficult (this demerit is especially conspicuous in the case of ten-key input and handwriting input).

[0005] For example, a music search system having the interface shown in FIG. 8 will be described below. This system can search for a song based on one or more of an artist name, a song name, and the name of a CM using that song. The GUI (screen display) is used as output means, and speech is used as input means for the respective input items.

[0006] In this case, since the input items are displayed on the screen, the user can easily understand that he or she can make a search using any of the artist name, song name, and CM name. Since input to the respective input fields can be made by speech, input is easy.

[0007] Speech contents input to the respective input fields are recognized using different grammars. For example, the artist name, song name, and CM name are respectively recognized using the grammars of the artist name, song name, and CM name.

[0008] When the speech input and GUI are used together, and there are a plurality of input fields, as shown in FIG. 8, the input field corresponding to a given speech input must be discriminated.

[0009] As a method for this purpose, speech recognition is performed simultaneously using the grammars for all the input fields, and the input field corresponding to the input is determined based on the obtained recognition result.

[0010] In the example shown in FIG. 8, speech recognition is performed simultaneously using the grammars for the artist name, song name, and CM name, and if the recognition result indicates a CM name, it can be determined that the input is directed to the CM name input field.

[0011] Note that the speech recognition rate normally lowers as the grammar becomes larger in scale. Hence, when the grammars for the plurality of input fields are used simultaneously, the recognition rate for the speech input lowers.

DISCLOSURE OF INVENTION

[0012] The present invention has been made in consideration of the aforementioned problems, and has as its object to improve the recognition rate of speech input by preventing an increase in the scale of the grammar used in speech recognition even when a plurality of input fields are available.

[0013] In order to achieve the above object, an information processing apparatus according to the present invention comprises the following arrangement.

[0014] That is, an information processing apparatus comprises:

[0015] input means for inputting a document having a plurality of input items;

[0016] discrimination means for discriminating an active input item from the plurality of input items in accordance with a display state of the document; and

[0017] selection means for selecting a specific grammar corresponding to the active input item discriminated by the discrimination means.

[0018] In order to achieve the above object, an information processing apparatus according to another aspect of the present invention comprises the following arrangement. That is, an information processing apparatus comprises:

[0019] input means for inputting a document having a plurality of input items;

[0020] judge means for judging whether or not the document contains designation for selecting a specific grammar in accordance with a display state of the document; and

[0021] control means for controlling selection of a grammar according to a judgement result.

[0022] In order to achieve the above object, an information processing method according to the present invention comprises:

[0023] the input step of inputting a document having a plurality of input items;

[0024] the discrimination step of discriminating an active input item from the plurality of input items in accordance with a display state of the document; and

[0025] the selection step of selecting a specific grammar corresponding to the active input item discriminated in the discrimination step.

[0026] Furthermore, in order to achieve the above object, an information processing method according to the present invention comprises:

[0027] the input step of inputting a document having a plurality of input items;

[0028] the judge step of judging whether or not the document contains designation for selecting a specific grammar in accordance with a display state of the document; and

[0029] the control step of controlling selection of a grammar according to a judgement result.

[0030] Also, according to the present invention, a control program for making a computer execute the information processing method, a computer readable medium that stores the control program, and a computer program product are provided.

BRIEF DESCRIPTION OF DRAWINGS

[0031] FIG. 1 is a block diagram showing a basic arrangement of a speech interface apparatus according to the first embodiment of the present invention;

[0032] FIG. 2 is a block diagram showing a practical hardware arrangement of the speech interface apparatus according to the first embodiment;

[0033] FIG. 3 is a flow chart showing an outline of the processing sequence in the first embodiment;

[0034] FIG. 4 is a table showing an example of the data format of a field information holding unit;

[0035] FIG. 5 is a table showing an example of the data format of a grammar holding unit;

[0036] FIG. 6 is a block diagram showing a basic arrangement of a speech interface apparatus according to the second embodiment of the present invention;

[0037] FIG. 7 is a flow chart showing an outline of the processing sequence in the second embodiment;

[0038] FIG. 8 shows an example of an input screen;

[0039] FIG. 9 is a view for explaining a displayed portion and non-displayed portions on the input screen;

[0040] FIG. 10 shows an example of expression by means of a hypertext document;

[0041] FIG. 11 shows a practical display example of the input screen using the hypertext document shown in FIG. 10;

[0042] FIG. 12 shows a practical display example of the input screen; and

[0043] FIG. 13 is a flow chart showing an outline of the processing sequence in the second embodiment when the hypertext document shown in FIG. 10 is used.

BEST MODE FOR CARRYING OUT THE INVENTION

First Embodiment

[0044] The present invention will be described in detail hereinafter with reference to the accompanying drawings.

[0045] FIG. 1 is a block diagram showing a basic arrangement of an apparatus according to the first embodiment of an information input apparatus, information input method, and storage medium of the present invention.

[0046] Referring to FIG. 1, reference numeral 101 denotes an information display unit. The information display unit 101 also displays information of input fields (input items). Reference numeral 102 denotes a field selection unit for selecting one of the input fields displayed on the information display unit 101. Reference numeral 103 denotes an input detection unit for detecting if a signal indicating selection of a given input field is received from the field selection unit 102.

[0047] Reference numeral 104 denotes a field determination unit for determining a selected input field on the basis of a select signal sent from the field selection unit 102 via the input detection unit 103. Note that an input field selected by the field selection unit 102 will be referred to as an active field hereinafter.

[0048] Reference numeral 105 denotes a field switching unit for switching an active field on the basis of the determination result of the field determination unit 104. Reference numeral 106 denotes a field information holding unit for holding information for all the input fields in the currently displayed contents. The contents of the field information holding unit 106 are as shown in, e.g., FIG. 4.

[0049] More specifically, as shown in FIG. 4, numbers are assigned to input fields, and the field information holding unit holds the input field numbers, their values (no values are set in a default state), and IDs of grammars used in speech recognition of the corresponding input fields.
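As a non-limiting illustration, the data format of the field information holding unit described above might be sketched as follows. The class name FieldInfo, the helper grammar_id_for, and the concrete field numbers and grammar IDs (borrowed from the music search example) are assumptions made for illustration only and do not appear in the drawings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldInfo:
    """One row of the FIG. 4 style table (hypothetical layout)."""
    number: int                   # input field number
    grammar_id: str               # ID of the grammar used for this field
    value: Optional[str] = None   # no value is set in the default state


# Hypothetical contents: three input fields using grammars A, B, and C.
field_info: Dict[int, FieldInfo] = {
    1: FieldInfo(number=1, grammar_id="A"),
    2: FieldInfo(number=2, grammar_id="B"),
    3: FieldInfo(number=3, grammar_id="C"),
}


def grammar_id_for(field_number: int) -> str:
    """Look up the grammar ID associated with an input field number."""
    return field_info[field_number].grammar_id
```

Keying the table by field number keeps the lookup from an active field to its grammar ID a single dictionary access.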

[0050] Reference numeral 107 denotes an active field holding unit for holding an active field. Reference numeral 108 denotes a grammar switching unit for switching a grammar on the basis of the determination result of the field determination unit 104. Note that the grammar selected by the grammar switching unit 108 will be referred to as an active grammar hereinafter.

[0051] Reference numeral 109 denotes a grammar holding unit for holding all grammars used in speech input in the contents currently displayed on the information display unit 101. The contents of the grammar holding unit 109 include grammar IDs and descriptions of the grammars, as shown in, e.g., FIG. 5. Information associated with a grammar to be used is described in the displayed contents (this will be described later with reference to FIG. 10). Assume that the grammar itself can be acquired from a disk device (not shown) or a server (not shown) on the network.

[0052] Reference numeral 110 denotes an active grammar holding unit for holding the ID of an active grammar. Reference numeral 111 denotes a speech input unit for inputting speech. Reference numeral 112 denotes a speech recognition unit for recognizing speech input from the speech input unit 111 using the grammar held in the active grammar holding unit 110. Reference numeral 113 denotes a recognition result holding unit for holding the recognition result of the speech recognition unit 112.

[0053] FIG. 2 is a block diagram showing a practical hardware arrangement of the speech input apparatus of this embodiment.

[0054] Referring to FIG. 2, reference numeral 201 denotes a CPU which operates according to a program that implements the sequence to be described later. Reference numeral 202 denotes a memory which provides the field information holding unit 106, the active field holding unit 107, the grammar holding unit 109, the active grammar holding unit 110, the recognition result holding unit 113, and a storage area required for executing the program.

[0055] Reference numeral 203 denotes a control memory for holding the program that implements the sequence to be described later. Reference numeral 204 denotes a pointing device which forms the aforementioned field selection unit 102. Reference numeral 205 denotes a display which forms the information display unit 101. Reference numeral 206 denotes a microphone which forms the speech input unit 111. Reference numeral 207 denotes a bus which connects the respective building components.

[0056] The operation of the apparatus of this embodiment will be explained below with reference to the flow chart shown in FIG. 3. In the following description, a case will be exemplified wherein a mouse is used as the pointing device 204.

[0057] When given contents are displayed, grammars used in speech recognition of the respective input fields of the contents are loaded into the grammar holding unit 109, and the correspondence between the input fields and grammar IDs is stored in the field information holding unit 106.

[0058] The input detection unit 103 checks in the first step S301 if an input from the mouse is detected. The mouse input may be recognized by detecting either a mouse click or a stay of the mouse cursor on a given object for a predetermined period of time or more. This step is repeated until an input is detected. If an input is detected, the flow advances to step S302.

[0059] It is checked in step S302 if the input detected in step S301 is one for selecting an input field. If it is determined as a result of checking that the input is not selection of an input field, the flow returns to step S301. If the input is selection of an input field, the flow advances to step S303.

[0060] The field determination unit 104 checks in step S303 which input field is selected. The field switching unit 105 stores the selected input field in the active field holding unit 107.

[0061] In step S304, the grammar switching unit 108 stores an active grammar in the active grammar holding unit 110. Note that the active grammar is, of the grammars held in the grammar holding unit 109, the one corresponding to the input field held in the active field holding unit 107. The grammar ID corresponding to the current active field is looked up in the field information holding unit 106, and the grammar corresponding to that grammar ID is read out from the grammar holding unit 109.

[0062] It is checked in step S305 if speech is input from the speech input unit 111. This step is repeated until speech is input. If speech is input, the flow advances to step S306.

[0063] In step S306, the speech recognition unit 112 executes a recognition process of the speech input in step S305 using the grammar held in the active grammar holding unit 110. The speech recognition result is held in the recognition result holding unit 113.

[0064] In step S307, the result held in the recognition result holding unit 113 is held in the field information holding unit 106. That is, in FIG. 4, the “value” column corresponding to the active field holds the recognition result.

[0065] In step S308, the information display unit 101 displays the result held in the recognition result holding unit 113 in the input field held in the active field holding unit 107. In this way, the processing ends.
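The sequence of steps S301 to S308 can be sketched, purely for illustration, as follows. This is a minimal sketch under stated assumptions: a grammar is modeled as a plain set of accepted phrases, speech is simulated by a text string, and the names SpeechForm, select_field, and on_speech are hypothetical. An actual speech recognition unit would replace the simple membership test.

```python
from typing import Dict, Optional, Set

# Toy stand-in for the grammar holding unit: grammar ID -> accepted phrases.
GRAMMARS: Dict[str, Set[str]] = {
    "A": {"artist one", "artist two"},
    "B": {"song one", "song two"},
    "C": {"commercial one"},
}
# Toy stand-in for the field information: field number -> grammar ID.
FIELD_TO_GRAMMAR: Dict[int, str] = {1: "A", 2: "B", 3: "C"}


class SpeechForm:
    def __init__(self) -> None:
        self.values: Dict[int, Optional[str]] = {n: None for n in FIELD_TO_GRAMMAR}
        self.active_field: Optional[int] = None     # active field holding unit
        self.active_grammar: Optional[str] = None   # active grammar holding unit

    def select_field(self, number: int) -> bool:
        """S301-S304: accept a selection and switch the active grammar."""
        if number not in FIELD_TO_GRAMMAR:
            return False                  # not a selection of an input field
        self.active_field = number
        self.active_grammar = FIELD_TO_GRAMMAR[number]
        return True

    def on_speech(self, utterance: str) -> Optional[str]:
        """S305-S308: recognize with the active grammar only, store the result."""
        if self.active_field is None or self.active_grammar is None:
            return None
        phrase = utterance.strip().lower()
        if phrase not in GRAMMARS[self.active_grammar]:
            return None                   # outside the active grammar
        self.values[self.active_field] = phrase
        return phrase


form = SpeechForm()
form.select_field(1)
form.on_speech("Artist One")
```

Because only the grammar of the selected field is consulted, an utterance belonging to another field's grammar is rejected rather than misrecognized.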

[0066] The processing contents will be described in detail below taking the contents shown in FIG. 8 as an example. In FIG. 8, the numbers of first, second, and third input fields 801, 802, and 803 are respectively 1, 2, and 3.

[0067] Also, if grammars for the artist name, song name, and CM name are respectively first, second, and third grammars A, B, and C, the contents of the field information holding unit 106 are as shown in FIG. 4. When input speech is recognized according to the prior art, all three grammars, i.e., first, second, and third grammars A, B, and C are used.

[0068] On the other hand, in this embodiment, if, for example, the first input field 801 is selected by the mouse, input speech is recognized using only first grammar A corresponding to first input field 1. In this way, since the scale of the grammar used to recognize input speech can be made smaller than in the prior art, the recognition rate of input speech can be greatly improved.

[0069] Likewise, if the second input field 802 is selected by the mouse, input speech is recognized using only second grammar B corresponding to second input field 2; if the third input field 803 is selected, input speech is recognized using only third grammar C corresponding to third input field 3.

[0070] In the above embodiment, the mouse is used as means that forms the field selection unit 102, but another means may be used. For example, the display of the information display unit 101 may have a touch panel, and a desired field may be designated by a pen or finger.

[0071] Also, an n-th input field (arbitrary input field) may be designated using a ten-key pad. That is, a desired input field may be designated by a numerical value input. Furthermore, an input field may be designated by the direction of the line of sight using a line of sight input device.

[0072] Alternatively, speech input objects (buttons, icons, images, or the like) having one-to-one correspondence with input fields may be displayed, and one of these objects may be selected to select an input field.

[0073] In the above embodiment, the grammar to be used in speech recognition is switched when an input field is selected. Alternatively, an active input field may be discriminated at the time of input of speech so as to select a grammar.

[0074] Moreover, in the above embodiment, the start and end of speech input may be designated by a selection operation of an input field. For example, the selection timing of an input field is processed as a speech input start timing, and the selection end timing of the input field is processed as a speech input end timing. For example, speech input is captured while a mouse pointer operated by the mouse stays on a given input field.

[0075] In the above embodiment, the GUI is used as output means, and speech input is used as input means. However, the present invention is not limited to these specific means. For example, the GUI may also be used as input means, and speech may also be used as output means.

Second Embodiment

[0076] The second embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

[0077] FIG. 6 is a block diagram showing a basic arrangement of an apparatus according to the second embodiment of the present invention.

[0078] Referring to FIG. 6, reference numeral 601 denotes an information display unit. The information display unit 601 also displays information of input fields.

[0079] Reference numeral 602 denotes a display content holding unit for holding contents actually displayed on the information display unit 601. Reference numeral 603 denotes a display information switching unit for switching information to be displayed on the information display unit 601. In particular, if the information display unit has a small size, it cannot display all contents at one time. In such a case, by switching display information, the remaining contents are displayed in turn. For example, this operation is implemented by page switching, scrolling, or the like.

[0080] Reference numeral 604 denotes a field determination unit for determining an input field actually displayed on the information display unit 601. An input field displayed on the information display unit 601 will be referred to as an active field. Unlike in the first embodiment, this embodiment assumes that the number of active fields is not limited to one.

[0081] Reference numeral 605 denotes a field switching unit for switching an active field on the basis of the determination result of the field determination unit 604. Reference numeral 606 denotes a field information holding unit for holding information for all the input fields in the currently displayed contents. The contents of the field information holding unit 606 are as shown in, e.g., FIG. 4.

[0082] More specifically, numbers are assigned to input fields, and the field information holding unit holds the input field numbers, their values (no values are set in a default state), and IDs of grammars used in speech recognition of the corresponding input fields.

[0083] Reference numeral 607 denotes an active field holding unit for holding an active field. Reference numeral 608 denotes a grammar switching unit for switching a grammar on the basis of the determination result of the field determination unit 604. Note that the grammar selected by the grammar switching unit 608 will be referred to as an active grammar. Reference numeral 609 denotes a grammar holding unit for holding all grammars that can be used to recognize input speech in the contents currently displayed on the information display unit 601. The contents of the grammar holding unit 609 include grammar IDs and descriptions of the grammars, as shown in, e.g., FIG. 5. Information associated with a grammar to be used is described in the displayed contents. Assume that the grammar itself can be acquired from a disk device (not shown) or a server (not shown) on the network.

[0084] Reference numeral 610 denotes an active grammar holding unit for holding the ID of an active grammar. Reference numeral 611 denotes a speech input unit for inputting speech. Reference numeral 612 denotes a speech recognition unit for recognizing speech input from the speech input unit 611 using the grammar held in the active grammar holding unit 610. Reference numeral 613 denotes a recognition result holding unit for holding the recognition result of the speech recognition unit 612.

[0085] Since a practical arrangement of the speech input apparatus of the second embodiment is the same as that of the first embodiment shown in FIG. 2, it will be explained using FIG. 2 common to the first embodiment.

[0086] Referring to FIG. 2, reference numeral 201 denotes a CPU which operates according to a program that implements the sequence to be described later. Reference numeral 202 denotes a memory which provides the display content holding unit 602, the field information holding unit 606, the active field holding unit 607, the grammar holding unit 609, the active grammar holding unit 610, the recognition result holding unit 613, and a storage area required for executing the program.

[0087] Reference numeral 203 denotes a control memory for holding the program that implements the sequence to be described later. Reference numeral 204 denotes a pointing device which implements the display information switching unit 603. Reference numeral 205 denotes a display which implements the information display unit 601. Reference numeral 206 denotes a microphone which implements the speech input unit 611. Reference numeral 207 denotes a bus which connects the respective building components.

[0088] The operation of the information input apparatus of the second embodiment will be explained below with reference to the flow chart shown in FIG. 7.

[0089] When given contents are displayed, grammars used in speech recognition of the respective input fields of the contents are loaded into the grammar holding unit 609, and the correspondence between the input fields and grammar IDs is stored in the field information holding unit 606.

[0090] It is checked in the first step S701 if speech is input from the speech input unit 611. Step S701 is repeated until speech is input, and if speech is input, the flow advances to step S702.

[0091] It is checked in step S702, based on the contents of the display content holding unit 602, which input fields are actually currently displayed.

[0092] In step S703, the field switching unit 605 stores the currently displayed input fields in the active field holding unit 607.

[0093] In step S704, the grammar switching unit 608 stores active grammars in the active grammar holding unit 610. Note that an active grammar is, of the grammars held in the grammar holding unit 609, one corresponding to an input field held in the active field holding unit 607. The grammar ID corresponding to each current active field is looked up in the field information holding unit 606, and the grammar corresponding to that grammar ID is read out from the grammar holding unit 609.

[0094] In step S705, the speech recognition unit 612 executes a recognition process of the speech input in step S701 using the grammars held in the active grammar holding unit 610. Assume that the recognition process returns a recognition result and the ID of the grammar used in recognition. More specifically, the grammars corresponding to a plurality of types of grammar IDs are used. Recognition results are obtained for the respective grammar IDs, and the candidate with the highest similarity is output together with its grammar ID. The recognition result is held in the recognition result holding unit 613.

[0095] In step S706, the input field to which the input was made is determined based on the grammar ID obtained in step S705. Since the correspondence between the grammar IDs and input fields is stored in the field information holding unit 606, its contents can be looked up. For example, assume that the field information holding unit 606 has the contents shown in FIG. 4, and the active fields are “1” and “3”. If third grammar C is returned as the grammar ID together with the recognition result, it is determined that this input was made for third input field 3 corresponding to third grammar C.

[0096] In step S707, the result held in the recognition result holding unit 613 is held in the field information holding unit 606. That is, in FIG. 4, the “value” column corresponding to the determined input field holds the recognition result. In step S708, the information display unit 601 displays the result held in the recognition result holding unit 613 in the input field determined in step S706. In this way, the processing ends.
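The processing of steps S702 to S707 may be illustrated by the following minimal sketch. It rests on several assumptions: speech is simulated by a text string, each grammar is a list of phrases, and the "similarity" of a candidate is approximated with the string similarity ratio of Python's difflib.SequenceMatcher in place of an acoustic score. The function names recognize, field_for_grammar, and handle_speech are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Toy grammars and field information (assumed contents).
GRAMMARS: Dict[str, List[str]] = {
    "A": ["artist one", "artist two"],
    "B": ["song one", "song two"],
    "C": ["commercial one", "commercial two"],
}
FIELD_TO_GRAMMAR: Dict[int, str] = {1: "A", 2: "B", 3: "C"}


def recognize(utterance: str, grammar_ids: List[str]) -> Optional[Tuple[str, str]]:
    """S705: return (best phrase, its grammar ID) over the active grammars."""
    text = utterance.strip().lower()
    best: Optional[Tuple[float, str, str]] = None
    for gid in grammar_ids:
        for phrase in GRAMMARS[gid]:
            score = SequenceMatcher(None, text, phrase).ratio()
            if best is None or score > best[0]:
                best = (score, phrase, gid)
    return None if best is None else (best[1], best[2])


def field_for_grammar(grammar_id: str, active_fields: List[int]) -> Optional[int]:
    """S706: map the returned grammar ID back to an active input field."""
    for number in active_fields:
        if FIELD_TO_GRAMMAR[number] == grammar_id:
            return number
    return None


def handle_speech(utterance: str, displayed_fields: List[int],
                  values: Dict[int, Optional[str]]) -> Optional[int]:
    # S702-S704: displayed fields become active; collect their grammar IDs.
    grammar_ids = [FIELD_TO_GRAMMAR[n] for n in displayed_fields]
    result = recognize(utterance, grammar_ids)
    if result is None:
        return None
    phrase, gid = result
    target = field_for_grammar(gid, displayed_fields)
    if target is not None:
        values[target] = phrase          # S707: store in the "value" column
    return target
```

With active fields 1 and 3, an utterance matching grammar C is routed to input field 3 without the user selecting a field, which mirrors the example given for step S706.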

[0097] The processing contents will be described in detail below taking the contents shown in FIG. 8 as an example.

[0098] Assume that the contents shown in FIG. 8 are displayed, as shown in FIG. 9. In FIG. 9, reference numeral 904 denotes an actually displayed portion; 905, a non-displayed portion.

[0099] Assume that the numbers of first, second, and third input fields 901, 902, and 903 are respectively 1, 2, and 3. If grammars for the artist name, song name, and CM name are respectively first, second, and third grammars A, B, and C, the contents of the field information holding unit 606 are as shown in FIG. 4.

[0100] When input speech is recognized according to the prior art, three grammars, i.e., first, second, and third grammars A, B, and C are used. On the other hand, in this embodiment, since the only input field actually displayed is 901, input speech is recognized using only grammar A, which corresponds to input field 1. In this way, since the scale of the grammar used to recognize input speech can be made smaller than in the prior art, the recognition rate of input speech can be greatly improved.

[0101] Even when a plurality of input fields are displayed, since the grammars used in speech recognition are limited to those corresponding to the displayed input fields, high recognition precision can be maintained. Upon applying the recognition result to one of a plurality of active fields, the input field serving as the application destination is determined based on the grammar used in speech recognition. For this reason, even when a plurality of input fields are displayed, a value (speech recognition result) is automatically set in the appropriate input field, thus improving operability.

Third Embodiment

[0102] The third embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

[0103] Since the basic arrangement and hardware arrangement of the apparatus according to the third embodiment are the same as those of the second embodiment shown in FIGS. 6 and 2, a detailed description thereof will be omitted.

[0104] In the third embodiment, the contents of the second embodiment are described using hypertext, and the contents are processed differently from the second embodiment. The third embodiment will be described in detail below with reference to FIGS. 10 to 13.

[0105] FIG. 10 shows an example of contents expressed by a hypertext document. The hypertext document is held in the display content holding unit 602, and is displayed by the information display unit 601, as shown in FIG. 11.

[0106] A tag 101 in FIG. 10 indicates designation of a grammar-display link, i.e., whether or not a grammar is switched in correspondence with switching of a display screen. If this tag is described, a process for switching a grammar in synchronism with a change in display is done; otherwise, a process for inhibiting switching of a grammar in synchronism with a change in display is done. Details of this process will be described later using the flow chart shown in FIG. 13.

[0107] Reference numeral 102 denotes a description of the type of data to be input to an input field “artist name”, the size of that input field, and position information (http://temp/art.grm) of a grammar used in the input field. Likewise, reference numerals 103, 104, 105, and 106 denote descriptions of information that pertains to the respective input fields, and position information of grammars stored in correspondence with these fields.

[0108] FIG. 11 shows the display state of the hypertext shown in FIG. 10 on the information display unit 601.

[0109] The contents shown in FIG. 11 include four input fields (1001, 1002, 1003, and 1004). If the display screen is sufficiently large, all four input fields are displayed within one screen; if the display screen is small, only some of the four input fields are displayed, as shown in FIG. 12. In FIG. 12, the two input fields 1002 and 1003 are displayed. In this case, by changing the display state of the screen by, e.g., vertically scrolling the screen using a scroll bar, the non-displayed fields can be confirmed.

[0110] The operation of the information input apparatus of this embodiment will be described below with reference to FIG. 13.

[0111] In step S801, the hypertext shown in FIG. 10 is read. In step S802, the hypertext read in step S801 is analyzed, and the GUI shown in FIG. 11 is displayed based on the analysis result. The position of each grammar, e.g., http://temp/art.grm, is detected based on the analysis result. Also, the contents of a tag, e.g., whether or not a <form> tag contains an entry grmselect=“display”, and the like are analyzed.

[0112] In step S803, grammars are read based on the grammar position information detected in step S802, and four grammars corresponding to the artist name, song name, CM name, and rank name are held in the grammar holding unit 609. In step S804, field information, i.e., the correspondence between the input fields and grammars, is held in the field information holding unit 606 on the basis of the analysis result in step S802. In this example, grammars http://temp/art.grm, http://temp/kyoku.grm, http://temp/cm.grm, and http://temp/rank.grm are held in correspondence with the input fields 1001, 1002, 1003, and 1004, respectively.
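The analysis of steps S802 and S804 might be sketched as follows using the HTMLParser class of the Python standard library. The exact markup of FIG. 10 is not reproduced here; the SAMPLE document, the attribute name grammar used to carry the grammar position, and the field names artist, kyoku, cm, and rank are assumptions for illustration. Only the <form> tag, the grmselect=“display” entry, and the four grammar URLs come from the description above.

```python
from html.parser import HTMLParser
from typing import Dict

# Hypothetical markup; the "grammar" and "name" attributes are assumed.
SAMPLE = """
<form grmselect="display">
  <input type="text" name="artist" size="20" grammar="http://temp/art.grm">
  <input type="text" name="kyoku" size="20" grammar="http://temp/kyoku.grm">
  <input type="text" name="cm" size="20" grammar="http://temp/cm.grm">
  <input type="text" name="rank" size="20" grammar="http://temp/rank.grm">
</form>
"""


class FormGrammarParser(HTMLParser):
    """Collects the grammar-display link flag and per-field grammar positions."""

    def __init__(self) -> None:
        super().__init__()
        self.display_linked = False               # grmselect="display" present?
        self.field_grammars: Dict[str, str] = {}  # field name -> grammar URL

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.display_linked = attributes.get("grmselect") == "display"
        elif tag == "input" and "grammar" in attributes:
            name = attributes.get("name", "")
            self.field_grammars[name] = attributes["grammar"]


parser = FormGrammarParser()
parser.feed(SAMPLE)
```

After parsing, parser.field_grammars plays the role of the field information (input field to grammar correspondence), and parser.display_linked records whether the grammar is to follow the display state.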

[0113] In step S805, speech input is detected. If speech input is detected, the flow advances to step S806. It is checked in step S806, based on the analysis result in step S802, if the <form> tag contains an entry grmselect=“display”, i.e., whether a grammar to be used is selected in synchronism with a change in display. If that entry is found, the flow advances to step S808; otherwise, the flow advances to step S807.

[0114] If no entry is found, all grammars are set as active grammars in step S807. That is, the four grammars are held in the active grammar holding unit 610, and the flow advances to the speech recognition process in step S811.

[0115] If an entry is found, it is checked in step S808 which input fields are currently actually displayed. In step S809, the currently displayed input fields are held in the active field holding unit 607. In step S810, of the four grammars held in step S803, the grammars corresponding to the input fields held in the active field holding unit 607 are held as active grammars in the active grammar holding unit 610. In FIG. 12, two out of the four fields, i.e., the input fields 1002 and 1003, are displayed. The grammars corresponding to these two input fields are http://temp/kyoku.grm and http://temp/cm.grm, and these two grammars are held as active grammars.
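The branch of steps S806 to S810 can be summarized by the following sketch. The function name select_active_grammars and its parameters are hypothetical; the field numbers and grammar URLs are those of the example above.

```python
from typing import Dict, List, Set

# Field information from the example: input field -> grammar position.
FIELD_TO_GRAMMAR: Dict[int, str] = {
    1001: "http://temp/art.grm",
    1002: "http://temp/kyoku.grm",
    1003: "http://temp/cm.grm",
    1004: "http://temp/rank.grm",
}


def select_active_grammars(display_linked: bool,
                           displayed_fields: Set[int],
                           field_to_grammar: Dict[int, str]) -> List[str]:
    """Return the grammars to be held as active grammars.

    display_linked corresponds to the check in S806 for grmselect="display".
    """
    if not display_linked:
        # S807: no entry found, so all grammars become active.
        active_fields = list(field_to_grammar)
    else:
        # S808-S809: only the currently displayed input fields are active.
        active_fields = [f for f in field_to_grammar if f in displayed_fields]
    # S810: grammars corresponding to the active fields.
    return [field_to_grammar[f] for f in active_fields]


# Display state of FIG. 12: only input fields 1002 and 1003 are visible.
fig12 = select_active_grammars(True, {1002, 1003}, FIELD_TO_GRAMMAR)
```

Passing the display state as a set keeps the function independent of how scrolling or page switching is detected.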

[0116] In step S811, a recognition process of input speech is executedusing the grammars held as active grammars in step S807 or S810. Assumethat the recognition process returns a recognition result and the ID ofthe grammar used in recognition. The recognition result and the ID ofthe grammar used in recognition are held in the recognition resultholding unit 613.

[0117] In step S812, an input field to which the input was made is determined based on the grammar ID obtained in step S811. Since the correspondence between the grammar IDs and input fields is held in the field information holding unit 606, it is looked up.

[0118] In step S813, the recognition result held in the recognition result holding unit 613 is held in the field information holding unit 606. More specifically, the recognition result is held in a column of value in FIG. 4.

[0119] In step S814, the information display unit displays the result held in the recognition result holding unit 613 in the input field determined in step S812.
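
The lookup in steps S812 and S813 can be sketched in the same way. The following Python fragment is a minimal illustration under assumptions: make_field_info and store_recognition_result are hypothetical names, the grammar ID is assumed to be the grammar URL, the table layout (one row per input field with a value column) is only loosely modeled on the description above, and the speech recognizer and the display of step S814 are not modeled.

```python
def make_field_info(field_grammars):
    """Build a hypothetical field information table: one row per input
    field, holding its grammar ID and an initially empty value."""
    return {f: {"grammar": g, "value": None} for f, g in field_grammars.items()}


def store_recognition_result(field_info, grammar_id, result):
    """Find the input field registered for grammar_id (step S812) and
    store the recognition result in its value column (step S813).

    Returns the field number, or None if no field uses that grammar.
    """
    for field, row in field_info.items():
        if row["grammar"] == grammar_id:
            row["value"] = result
            return field
    return None
```

The returned field number identifies the input field in which the recognition result would then be displayed.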

[0120] After that, if the user instructs to submit the recognition result displayed in the input field by pressing, e.g., a search button or the like, the recognition result is submitted to an application, which operates according to that result.

[0121] For example, when the user presses a search button while an artist name (recognition result) is displayed in the column 801 of artist name in FIG. 8, the displayed artist name or the like is submitted to an application, and a search result using the artist name can be obtained.

[0122] In this manner, the processing ends.

[0123] When grmselect=“display” is set in the hypertext shown in FIG. 10 and the display is as shown in FIG. 11, input can be made to all four input fields; when the display is as shown in FIG. 12, input can be made only to the two displayed input fields. When grmselect=“display” is not set, input can be made to the four input fields independently of whether or not the corresponding field is displayed.

[0124] According to this embodiment, when input items to be displayed include an item corresponding to a complex grammar, display is controlled not to display that item, thus limiting the input fields, and improving the recognition rate.

[0125] According to this embodiment, since a speech recognition process is done using only grammars corresponding to the actually displayed input fields, the scale of grammars can be reduced and, hence, the recognition rate of input speech can be improved.

[0126] According to this embodiment, the input fields are limited in accordance with the presence/absence of a tag indicating whether or not a grammar is switched in response to switching of the display screen. However, the present invention is not limited to this. For example, the input fields may be limited in accordance with description contents in a tag. More specifically, if grmselect=“none” is set in a tag, all grammars may be used; when grmselect=“display” is set, the grammars to be used can be limited in synchronism with a change in display. In this case, if no tag is set, recognition may be inhibited.
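
The variation in which the description contents of the tag select the behavior can be sketched as follows. This Python fragment is an illustration under assumptions only: grammars_for_mode is a hypothetical name, mode stands for the value of grmselect (None when no tag is set), and inhibited recognition is represented here simply by an empty list of active grammars.

```python
FIELD_GRAMMARS = {
    1001: "http://temp/art.grm",
    1002: "http://temp/kyoku.grm",
    1003: "http://temp/cm.grm",
    1004: "http://temp/rank.grm",
}


def grammars_for_mode(mode, field_grammars, displayed_fields):
    """Choose the active grammars from the grmselect value.

    mode is "none", "display", or None when the tag is absent.
    """
    if mode is None:
        # No tag set: recognition is inhibited (no active grammar).
        return []
    if mode == "none":
        # All grammars are used regardless of the display state.
        return list(field_grammars.values())
    if mode == "display":
        # Only grammars of currently displayed input fields are used.
        return [g for f, g in field_grammars.items() if f in displayed_fields]
    raise ValueError(f"unsupported grmselect value: {mode!r}")
```

Raising an error for any other value is a choice made for this sketch; the description does not state how other values would be handled.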

[0127] According to this embodiment, the currently displayed input fields are handled as active fields. However, the present invention is not limited to this. When a frame of an HTML document is used, or when a plurality of windows are used, input fields present on the currently active frame (the frame means a partitioned area on a web page, and a document can be scrolled in each area) or window may be handled as active fields.

[0128] According to this embodiment, the GUI is used as output means, and speech input is used as input means. However, the present invention is not limited to these specific means. For example, the GUI may also be used as input means, and speech may also be used as output means.

[0129] When a tag indicating whether or not a grammar is switched in response to switching of the display screen is set, the user may be informed of that. More specifically, an indicator or the like on the GUI may be provided. With this arrangement, the user can recognize in advance whether all grammars indicated by input fields are selected or specific grammars indicated by displayed input fields are selected, thus improving the operability of this information processing apparatus.

[0130] The aforementioned embodiments may be applied to either a system consisting of a plurality of devices or an apparatus consisting of a single device.

[0131] As a recording medium that stores a program code of a control program for implementing the functions of the aforementioned embodiments, for example, a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, magnetic tape, nonvolatile memory card, ROM, and the like may be used.

[0132] The program code is included in the embodiments of the present invention when the functions of the aforementioned embodiments are implemented by collaboration of the program code of the control program and an OS (operating system), another application software, or the like, which is running on a central processing unit 2.

[0133] Furthermore, the present invention includes a case wherein the functions of the aforementioned embodiments are implemented by some or all of actual processing operations executed by a CPU or the like arranged in a function extension board or a function extension unit, after the supplied program code is stored in a memory of the extension board or unit.

[0134] As described above, according to the present invention, since speech recognition can be done in accordance with the display states of input items, the recognition rate of input speech can be improved.

1. An information processing apparatus characterized by comprising: input means for inputting a document having a plurality of input items; discrimination means for discriminating an active input item from the plurality of input items in accordance with a display state of the document; and selection means for selecting a specific grammar corresponding to the active input item discriminated by said discrimination means.
2. The information processing apparatus according to claim 1, characterized in that said discrimination means discriminates an input item displayed on a display screen as the active input item.
3. The information processing apparatus according to claim 1, characterized by further comprising: speech input means for inputting speech; and speech recognition means for recognizing speech input by said speech input means using the grammar selected by said selection means.
4. The information processing apparatus according to claim 3, characterized by further comprising: determination means for determining an input item to which a recognition result of said speech recognition means is to be input; and control means for controlling to input the recognition result to the input item specified by said determination means.
5. The information processing apparatus according to claim 1, characterized by further comprising: display switching means for switching displayed contents on a display screen, and in that when the displayed contents are switched by said display switching means, said discrimination means discriminates an input item displayed on the display screen as the active input item.
6. The information processing apparatus according to claim 5, characterized in that said display switching means scrolls the display screen.
7. The information processing apparatus according to claim 5, characterized in that said display switching means switches a frame.
8. An information processing apparatus characterized by comprising: input means for inputting a document having a plurality of input items; judge means for judging whether or not the document contains designation for selecting a specific grammar in accordance with a display state of the document; and control means for controlling selection of a grammar according to a judgement result.
9. The information processing apparatus according to claim 8, characterized in that when said judge means judges that the document contains designation for selecting a specific grammar in accordance with the display state of the document, said control means selects a specific grammar corresponding to an input item displayed on a display screen.
10. The information processing apparatus according to claim 8, characterized in that when said judge means judges that the document does not contain any designation for selecting a specific grammar in accordance with the display state of the document, said control means selects all grammars corresponding to the input items in the document.
11. The information processing apparatus according to claim 8, characterized by further comprising: speech input means for inputting speech; and speech recognition means for recognizing speech input by said speech input means using the grammar selected by said control means.
12. The information processing apparatus according to claim 11, characterized by further comprising: determination means for determining an input item to which a recognition result of said speech recognition means is to be input; and control means for controlling to input the recognition result to the input item specified by said determination means.
13. The information processing apparatus according to claim 8, characterized by further comprising presentation means for, when said judge means judges that the document contains designation for selecting a specific grammar in accordance with the display state of the document, presenting a message indicating this.
14. An information processing method characterized by comprising: the input step of inputting a document having a plurality of input items; the discrimination step of discriminating an active input item from the plurality of input items in accordance with a display state of the document; and the selection step of selecting a specific grammar corresponding to the active input item discriminated in the discrimination step.
15. The information processing method according to claim 14, characterized in that the discrimination step includes the step of discriminating an input item displayed on a display screen as the active input item.
16. The information processing method according to claim 14, characterized by further comprising: the speech input step of inputting speech; and the speech recognition step of recognizing speech input in the speech input step using the grammar selected in the selection step.
17. The information processing method according to claim 16, characterized by further comprising: the determination step of determining an input item to which a recognition result in the speech recognition step is to be input; and the control step of controlling to input the recognition result to the input item specified in the determination step.
18. The information processing method according to claim 14, characterized by further comprising: the display switching step of switching displayed contents on a display screen, and in that when the displayed contents are switched in the display switching step, an input item displayed on the display screen is discriminated in the discrimination step as the active input item.
19. The information processing method according to claim 18, characterized in that the display switching step includes the step of scrolling the display screen.
20. The information processing method according to claim 18, characterized in that the display switching step includes the step of switching a frame.
21. An information processing method characterized by comprising: the input step of inputting a document having a plurality of input items; the judge step of judging whether or not the document contains designation for selecting a specific grammar in accordance with a display state of the document; and the control step of controlling selection of a grammar according to a judgement result.
22. The information processing method according to claim 21, characterized in that the control step includes the step of selecting, when it is judged in the judge step that the document contains designation for selecting a specific grammar in accordance with the display state of the document, a specific grammar corresponding to an input item displayed on a display screen.
23. The information processing method according to claim 21, characterized in that the control step includes the step of selecting, when it is judged in the judge step that the document does not contain any designation for selecting a specific grammar in accordance with the display state of the document, all grammars corresponding to the input items in the document.
24. The information processing method according to claim 21, characterized by further comprising: the speech input step of inputting speech; and the speech recognition step of recognizing speech input in the speech input step using the grammar selected in the control step.
25. The information processing method according to claim 24, characterized by further comprising: the determination step of determining an input item to which a recognition result in the speech recognition step is to be input; and the control step of controlling to input the recognition result to the input item specified in the determination step.
26. The information processing method according to claim 21, characterized by further comprising the presentation step of presenting, when it is judged in the judge step that the document contains designation for selecting a specific grammar in accordance with the display state of the document, a message indicating this.
27. A computer readable medium that stores a control program for making a computer implement an information process, said control program characterized by comprising: a code of the input step of inputting a document having a plurality of input items; a code of the discrimination step of discriminating an active input item from the plurality of input items in accordance with a display state of the document; and a code of the selection step of selecting a specific grammar corresponding to the active input item discriminated in the discrimination step.
28. A computer readable medium that stores a control program for making a computer implement an information process, said control program characterized by comprising: a code of the input step of inputting a document having a plurality of input items; a code of the judge step of judging whether or not the document contains designation for selecting a specific grammar in accordance with a display state of the document; and a code of the control step of controlling selection of a grammar according to a judgement result.
29. A control program for making a computer implement an information process, characterized by comprising: a code of the input step of inputting a document having a plurality of input items; a code of the discrimination step of discriminating an active input item from the plurality of input items in accordance with a display state of the document; and a code of the selection step of selecting a specific grammar corresponding to the active input item discriminated in the discrimination step.
30. A control program for making a computer implement an information process, characterized by comprising: a code of the input step of inputting a document having a plurality of input items; a code of the judge step of judging whether or not the document contains designation for selecting a specific grammar in accordance with a display state of the document; and a code of the control step of controlling selection of a grammar according to a judgement result.